Harvesting question-answer (QA) pairs from customer service chatlog in the wild is an efficient way to enrich the knowledge base for customer service chatbots in the cold start or continuous integration scenarios. Prior work attempts to obtain 1-to-1 QA pairs from growing customer service chatlog, which fails to integrate the incomplete utterances from the dialog context for composite QA retrieval. In this paper, we propose N-to-N QA extraction task in which the derived questions and corresponding answers might be separated across different utterances. We introduce a suite of generative/discriminative tagging based methods with end-to-end and two-stage variants that perform well on 5 customer service datasets and for the first time setup a benchmark for N-to-N DialogQAE with utterance and session level evaluation metrics. With a deep dive into extracted QA pairs, we find that the relations between and inside the QA pairs can be indicators to analyze the dialogue structure, e.g. information seeking, clarification, barge-in and elaboration. We also show that the proposed models can adapt to different domains and languages, and reduce the labor cost of knowledge accumulation in the real-world product dialogue platform.
translated by 谷歌翻译
We present DualNER, a simple and effective framework to make full use of both annotated source language corpus and unlabeled target language text for zero-shot cross-lingual named entity recognition (NER). In particular, we combine two complementary learning paradigms of NER, i.e., sequence labeling and span prediction, into a unified multi-task framework. After obtaining a sufficient NER model trained on the source data, we further train it on the target data in a {\it dual-teaching} manner, in which the pseudo-labels for one task are constructed from the prediction of the other task. Moreover, based on the span prediction, an entity-aware regularization is proposed to enhance the intrinsic cross-lingual alignment between the same entities in different languages. Experiments and analysis demonstrate the effectiveness of our DualNER. Code is available at https://github.com/lemon0830/dualNER.
translated by 谷歌翻译
Contrastive learning has become a new paradigm for unsupervised sentence embeddings. Previous studies focus on instance-wise contrastive learning, attempting to construct positive pairs with textual data augmentation. In this paper, we propose a novel Contrastive learning method with Prompt-derived Virtual semantic Prototypes (ConPVP). Specifically, with the help of prompts, we construct virtual semantic prototypes to each instance, and derive negative prototypes by using the negative form of the prompts. Using a prototypical contrastive loss, we enforce the anchor sentence embedding to be close to its corresponding semantic prototypes, and far apart from the negative prototypes as well as the prototypes of other sentences. Extensive experimental results on semantic textual similarity, transfer, and clustering tasks demonstrate the effectiveness of our proposed model compared to strong baselines. Code is available at https://github.com/lemon0830/promptCSE.
translated by 谷歌翻译
大多数当前的多模式摘要方法遵循级联的方式,在该方式中,首先使用现成的对象检测器来提取视觉特征,然后将这些功能与语言表示融合在一起,以使用编码器模型生成摘要。级联的方式无法捕获图像和段落之间的语义一致性,这对于确切的摘要至关重要。在本文中,我们向vil-sum提出了段落级级\ textbf {vi} sion- \ textbf {l} arnguage语义对齐和多模式\ textbf {sum} marization。 VIL-SUM的核心是一个联合多模式编码器,具有两个精心设计的任务,图像重新排序和图像选择。联合多模式编码器捕获了模式之间的交互,重新排序任务指导该模型学习段落级别的语义对齐,而选择任务指导模型在最终摘要中将模型指向所选摘要相关的图像。实验结果表明,我们提出的VIL-SUM显着优于当前最新方法。在进一步的分析中,我们发现两个精心设计的任务和联合多模式编码器可以有效地指导模型学习合理的段落图像和摘要图像关系。
translated by 谷歌翻译
无监督的摘要方法通过纳入预训练的语言模型的表示形式来取得了显着的结果。但是,当输入文档非常长的同时,现有方法无法考虑效率和有效性。为了解决这个问题,在本文中,我们提出了一个基于语义块的无监督长期文档摘要,提议有效的粗到1个方面的排名(C2F-FAR)框架。语义块是指描述相同方面的文档中的连续句子。具体而言,我们通过将一步排名方法转换为层次多范围两阶段排名来解决此问题。在粗级阶段,我们提出了一种新的段算法,将文档拆分为相关的语义块,然后过滤量微不足道的块。在精细阶段,我们在每个块中选择显着句子,然后从选定的句子中提取最终摘要。我们在四个长文档摘要数据集上评估了我们的框架:Gov-Report,Billsum,Arxiv和PubMed。我们的C2F-FAR可以在Gov-Report和Billsum上实现新的无监督摘要结果。此外,我们的方法比以前的方法高4-28倍。
translated by 谷歌翻译
在发展强化学习(RL)培训系统方面取得了重大进展。过去的作品,例如Impala,Apex,Seed RL,样本工厂等,旨在改善系统的整体吞吐量。在本文中,我们试图解决RL训练系统中的常见瓶颈,即平行环境执行,这通常是整个系统中最慢的部分,但很少受到关注。通过针对RL环境的策划设计,我们改善了不同硬件设置的RL环境模拟速度,从笔记本电脑和适度的工作站到NVIDIA DGX-A100等高端机器。在高端机器上,Envpool在Atari环境上的环境执行每秒可实现100万帧,在Mujoco环境上每秒执行300万帧。在笔记本电脑上运行时,Envpool的速度是Python子过程的2.8倍。此外,在开源社区中已经证明了与现有RL培训库的极大兼容性,包括Cleanrl,RL_Games,DeepMind Acme等。最后,Envpool允许研究人员以更快的速度迭代他们的想法,并具有巨大的潜力,并具有巨大的潜力事实上的RL环境执行引擎。示例运行表明,在笔记本电脑上训练Atari Pong和Mujoco Ant只需5分钟即可。 Envpool已经在https://github.com/sail-sg/envpool上开源。
translated by 谷歌翻译
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading-off model accuracy and constrained resources still need further improvements. This work rethinks the essential unity of efficient Inverted Residual Block in MobileNetv2 and effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance though sharing the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Massive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 that surpass \textbf{SoTA} CNN-/Transformer-based models, while trading-off the model accuracy and efficiency well.
translated by 谷歌翻译
We aim to bridge the gap between our common-sense few-sample human learning and large-data machine learning. We derive a theory of human-like few-shot learning from von-Neuman-Landauer's principle. modelling human learning is difficult as how people learn varies from one to another. Under commonly accepted definitions, we prove that all human or animal few-shot learning, and major models including Free Energy Principle and Bayesian Program Learning that model such learning, approximate our theory, under Church-Turing thesis. We find that deep generative model like variational autoencoder (VAE) can be used to approximate our theory and perform significantly better than baseline models including deep neural networks, for image recognition, low resource language processing, and character recognition.
translated by 谷歌翻译
Despite significant progress in object categorization, in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited sized class vocabularies and typically requires separation between supervised and unsupervised classes, allowing former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above mentioned challenges and address problems of supervised, zero-shot, generalized zero-shot and open set recognition using a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. Distance constraints ensure that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We illustrate that resulting model shows improvements in supervised, zero-shot, generalized zero-shot, and large open set recognition, with up to 310K class vocabulary on Animal with Attributes and ImageNet datasets.
translated by 谷歌翻译
We consider infinite horizon Markov decision processes (MDPs) with fast-slow structure, meaning that certain parts of the state space move "fast" (and in a sense, are more influential) while other parts transition more "slowly." Such structure is common in real-world problems where sequential decisions need to be made at high frequencies, yet information that varies at a slower timescale also influences the optimal policy. Examples include: (1) service allocation for a multi-class queue with (slowly varying) stochastic costs, (2) a restless multi-armed bandit with an environmental state, and (3) energy demand response, where both day-ahead and real-time prices play a role in the firm's revenue. Models that fully capture these problems often result in MDPs with large state spaces and large effective time horizons (due to frequent decisions), rendering them computationally intractable. We propose an approximate dynamic programming algorithmic framework based on the idea of "freezing" the slow states, solving a set of simpler finite-horizon MDPs (the lower-level MDPs), and applying value iteration (VI) to an auxiliary MDP that transitions on a slower timescale (the upper-level MDP). We also extend the technique to a function approximation setting, where a feature-based linear architecture is used. On the theoretical side, we analyze the regret incurred by each variant of our frozen-state approach. Finally, we give empirical evidence that the frozen-state approach generates effective policies using just a fraction of the computational cost, while illustrating that simply omitting slow states from the decision modeling is often not a viable heuristic.
translated by 谷歌翻译